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Abstract — A fast matching pursuit method using a Bayesian 
approach is introduced for sparse signal recovery. This method, 
referred to as nGpFBMP, performs Bayesian estimates of sparse 
signals even when the signal prior is non-Gaussian or unknown. 
It is agnostic on signal statistics and utilizes a priori statistics of 
additive noise and the sparsity rate of the signal, which are shown 
to be easily estimated from data if not available. nGpFBMP 
utilizes a greedy approach and order-recursive updates of its 
metrics to find the most dominant sparse supports to determine 
the approximate minimum mean square error (MMSE) estimate 
of the sparse signal. Simulation results demonstrate the power 
and robustness of our proposed estimator. 

Index Terms — sparse reconstruction, compressed sensing, 
Bayesian, linear regression, matching pursuit, basis selection, 
minimum mean-square error (MMSE) estimate, greedy algo- 
rithm 



I. Introduction 

SPARSITY is a feature that is abundant in both natural 
and man-made signals. Some examples of sparse signals 
include those from speech, images, videos, sensor arrays 
(e.g., temperature and light sensors), seismic activity, galactic 
activities, biometric activity, radiology, and frequency hopping. 
Given the vast existence of signals, their sparsity is an attrac- 
tive property because the exploitation of this sparsity may be 
useful in the development of simple signal processing systems. 
Some examples of systems in which a priori knowledge of 
signal sparsity is utilized include motion estimation |1|, mag- 
netic resonance imaging (MRI) Q, impulse noise estimation 
and cancellation in DSL (3), network tomography [4], and 
peak-to-average-power ratio reduction in OFDM |]5]. All of 
these systems are based on sparsity-aware estimators such as 
Lasso [6], basis pursuit [7|, structure -based estimator [8], fast 
Bayesian matching pursuit [9], and estimators related to the 
relatively new area of compressed sensing [ 10"] — [12]. 

Compressed sensing (CS), otherwise known as compressive 
sampling, has found many applications in the fields of commu- 
nications, image processing, medical imaging, and networking. 
CS algorithms have been shown to recover sparse signals from 
underdetermined systems of equations that take the form 



where x 6 C , and y € C are the unknown sparse 
signal and the observed signal, respectively. Furthermore, 



y = 4>x + n 



(1) 
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iM x N 



is the measurement matrix and n G 



is the 



additive Gaussian noise vector. Here, the number of unknown 
elements, N, is much larger than the number of observations, 
M. CS uses linear projections of sparse signals that preserve 
structure of signals; furthermore, these projections are used to 
reconstruct the sparse signal using ^-optimization with high 
probability. 



x = argmin ||x||, such that <&x = y 



(2) 



li -optimization is a convex optimization problem that conve- 
niently reduces to a linear program known as basis pursuit, 
which has the computational complexity of 0(N 3 ). Since, 
usually, N is large, such an approach rapidly becomes unreal- 
istic. To tackle this problem, some efficient alternatives such 
as orthogonal matching pursuit (OMP) [ 1 3 1 and the algorithm 
proposed by Haupt et al. ifPfl have been proposed. These 
algorithms fall into the category of greedy algorithms that 
are relatively faster than basis pursuit. However, an inherent 
problem in these systems is that the only a priori information 
utilized is the sparsity information. 

Another category of methods based on the Bayesian ap- 
proach considers complete a priori statistical information 
of sparse signals. A method called fast Bayesian matching 
pursuit (FBMP) [9|, adopts such an approach and assumes a 
Gaussian prior on the unknown sparse vector, x. This method 
performs sparse signal estimation via model selection and 
model averaging. The sparse vector is described as a mixture 
of several components, the selection of which is based on 
successive model expansion. FBMP obtains an approximate 
MMSE estimate of the unknown signal with high accuracy and 
low complexity. It was shown to outperform several sparse re- 
covery algorithms, including OMP 03), StOMP fl3), GPSR- 
Basic lfl6l . Sparse Bayes lfl7l . BCS lfl8l and a variational- 
Bayes implementation of BCS lfl9l . However, there are several 
drawbacks associated with this method. The assumption that 
the signal prior is Gaussian is not realistic, because, in most 
real-world scenarios, the signal distribution is not Gaussian, 
or it is unknown. In addition, its performance is dependent on 
the knowledge of signal statistics, which are usually difficult 
to compute. Although, a signal statistics estimation process is 
proposed, it is dependent on knowledge of the initial estimates 
of these signal parameters. The estimation process, in turn, has 
a negative impact on the complexity of the method. 

Another popular Bayesian method proposed by Larsson and 
Selen ll20l computes the MMSE estimate of the unknown 
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vector, x. Its approach is similar to that of FBMP in the 
sense that the sparse vector is described as a mixture of 
several components that are selected based on successive 
model reduction. It also requires knowledge of the noise 
and signal statistics. However, it was found that the MMSE 
estimate is insensitive to the a priori parameters and therefore 
an empirical-Bayesian variant that does not require any a 
priori knowledge of the data was devised. 

The Bayesian approaches mentioned above work success- 
fully only for Gaussian priors. It is reasonable to assume that 
any additive noise, generated at the sensing end, is Gaussian. 
However, assuming the signal statistics to be Gaussian is inad- 
equate, because the actual situation is not captured. Moreover, 
in situations where the assumption of a Gaussian prior is 
valid, the parameters of that prior (mean and covariance) need 
to be estimated, which is challenging, especially when the 
signal statistics are not i.i.d. In that respect, one can appreciate 
convex relaxation approaches that are agnostic with regard to 
signal statistics. 

In this paper, we pursue a Bayesian approach for sparse 
signal reconstruction that combines the advantages of the 
two approaches. On one hand, the approach is Bayesian, 
acknowledging the noise statistics and the signal sparsity rate, 
while on the other hand, the approach is agnostic on the signal 
amplitude statistics. The approach can bootstrap itself and 
estimate the required statistics (sparsity rate and noise vari- 
ance) when they are unknown. The algorithm is implemented 
in a greedy manner and pursues an order-recursive approach, 
helping it to enjoy low complexity. Specifically, the advantages 
of our approach are as follows 

1) The Bayesian estimate of the sparse signal is performed 
even when the signal prior is non-Gaussian or unknown. 

2) Signals whose statistics are unknown are dealt with. 
In fact, contrary to other methods, these statistics need 
not be estimated, which is particularly useful when the 
signal statistics are not i.i.d. Therefore, it is agnostic 
with regard to variations in signal statistics. 

3) The a priori statistics of the additive noise and the 
sparsity rate of the signal, which can be easily estimated 
from the data if not available, are utilized. 

4) The greedy nature of the approach and the order- 
recursive update of its metrics, make it simple. 

The remainder of this paper is organized as follows. In 
Section (TTJ we formulate the problem and present the MMSE 
setup in the non-Gaussian/unknown statistics case. In Section 
Hill we describe our greedy algorithm that is able to obtain 
the approximate MMSE estimate of the sparse vector. Section 
[IV] demonstrates how the greedy algorithm can be made faster 
by calculating various metrics in a recursive manner. This is 
followed by Section [V] which describes our hyperparameter 
estimation process. In Section |VI] we present our simulation 
results and in Section IVIII we conclude the paper. 

A. Notation 

We denote scalars with small-case letters (e.g., x), vectors 
with small-case, bold-face letters (e.g., x), matrices with 



upper-case, bold-face letters (e.g., X), and we reserve calli- 
graphic notation (e.g., S) for sets. We use x$ to denote the 
i th column of matrix X, x(j) to denote the j th entry of 
vector x, and <Sj to denote a subset of a set S. We also 
use Xs to denote the sub-matrix formed by the columns 
{xi : i £ S}, indexed by set S. Finally, we use x, x*, x T , and 
x H to respectively denote the estimate, conjugate, transpose, 
and conjugate transpose of the vector x. 

II. Problem Formulation and MMSE Setup 

A. The Signal Model 

The analysis in this paper considers obtaining an N x 1 
sparse vector, x, from an M x 1 observations vector, y. These 
observations obey the linear regression model 

y = <&x + n (3) 

where $ is a known M x N regression matrix and n ~ 
£/V(0,K n ) is the additive Gaussian noise vector. 

We shall assume that x has a sparse structure and is modeled 
as x = s.a o xb where o indicates Hadamard (element-by- 
element) multiplication. The vector x^ consists of elements 
that are drawn from some unknown distribution and the entries 
of xg are drawn i.i.d. from a Bernoulli distribution with 
success probability p. We observe that the sparsity of vector 
x is controlled by p and, therefore, we call it the sparsity 
parameter/rate. Typically, in Bayesian estimation, the signal 
entries are assumed to be drawn from a Gaussian distribution 
but here we would like to emphasize that the distribution and 
color of the entries of x^ do not matterQ 

B. MMSE Estimation of x 

To determine x, we compute the MMSE estimate of x given 
observation y. This estimate is formally defined by 

x mmse 4 E[x|y] = p(«S|y)E[x|y, S] (4) 

where the sum is executed over all possible 2^ support sets of 
x. In the following, we explain how the expectation E[x|y, <S], 
the posterior p(S\y) and the sum in can be evaluated. 
Given the support S, © becomes 

y = * 5 x s + n (5) 

where is a matrix formed by selecting columns of *& 
indexed by support S. Similarly, X5 is formed by selecting 
entries of x indexed by S. Since the distribution of x is 
unknown, the best we can do is to use the best linear unbiased 
estimate (BLUE) as an estimate of x, i.e., 

E[x|y,5] = (*S*s) -1 *3y < 6 > 
The posterior in (0} can be written using the Bayes rule as 

p(S|y) - MO (7) 
p{y) 

'The distribution may be unknown or known with unknown parameters or 
even Gaussian. Our developments are agnostic with regard to signal statistics. 
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The probability, p(y), is a factor common to all posterior 
probabilities that appear inland hence can be ignored. Since 
the elements of x are activated according to the Bernoulli 
distribution with success probability p, we have 



(8) 



It remains to evaluate the likelihood p(y\S). If xg is Gaussian, 
p(y\S) would also be Gaussian and that is easy to evaluate. 
On the other hand, when the distribution of x is unknown 
or even when it is known but non-Gaussian, determining 
p(y\S) is in general very difficult. To go around this, we 
note that y is formed by a vector in the subspace spanned 
by the columns of $5 plus a Gaussian noise vector, n. This 
motivates us to eliminate the non-Gaussian component by 
projecting y onto the orthogonal complement space of <f>$. 
This is done by multiplying y by the projection matrix P^ = 

I * 5 (*5*s) This leaves us with P^y = P^n, 

which is Gaussian with a zero mean and covariance 

K = E[(P»(P^n) H ] 



P^E[nn H ]P 



P^KnP^ 



H 



(9) 



where K n is the noise covariance matrix. Thus, 

1 ( 1 

2 



v{y\s) 



expf-^P^K^y 



1 /(2 7 r) M detK 

Dropping the pre-exponential factor yields 

p(y\S) oc exp (- ||Psy||K 

In the white noise case, n ~ Af(0, c^I), which we will focus 
on for the remainder of the paper, we have 



(10) 



(11) 



p(y\S) ~ exp 



1 



l p 5y| 



Substituting ([8]) and dT2b into (O finally yields an expression 
for the posterior probability. In this way, we have all the 
ingredients to compute the sum in (0J. Computing this sum 
is a challenging task when N is large because the number 
of support sets can be extremely large and the computational 
complexity can become unrealistic. To have a computationally 
feasible solution, this sum can be computed over a few support 
sets corresponding to significant posteriors. Let S d be the set 
of supports for which the posteriors are significant. Hence, we 
arrive at the following approximation to the MMSE estimate 



= ^p( l S d |y)E[x|y,5 d 



(13) 
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In the next section, we propose a greedy algorithm to find S d . 
Before proceeding, for ease of representation and convenience, 
we represent the posteriors in the log domain. In this regard, 
we define a dominant support selection metric, v(S), to be 



used by the greedy algorithm as follows: 

1/(5) 4 \np(y\S)p(S) 

= In exp (~ ||Psy|| 2 ) + ln (p'^'C 1 - P) 

n 

+ \S\lnp+(N -\S\)ln(l-p) 



(14) 



III. A Greedy Algorithm 



We now present a greedy algorithm to determine the set 
of dominant supports, S d , required to evaluate x ommse ( TT~3T >. 
We search for the optimal support in a greedy manner. We 
first start by finding the best support of size 1, which involves 
evaluating u(S) for S = {1}, . . . , {N}, i.e., a total of (^) 
search points. Let Si = {i*} be the optimal support. Now, 
we look for the optimal support of size 2, which involves a 
search of size ( 2 ). To reduce the search space, we pursue 
a greedy approach and look for the point i\ ^ i\ such that 
S2 = {ii!^} maximizes v{Si). This involves ( -T 1 ) search 
points (as opposed to the optimal search over ( 2 ) points). 
We continue in this manner by forming 1S3 = and 
searching for i\ in the remaining N — 2 points and so on until 
we reach Sp = . . . , ip}. The value of P is selected to be 
slightly larger than the expected number of nonzero elements 
in the constructed signal such that Pr(|<S| > P) is sufficiently 
smalfl A formal algorithmic description is presented in Table 
U and an example run of this algorithm for N — 7 and P = 4 
is presented in Fig. [TJ 

One point to note here is that in our greedy move from 
Sj to Sj+i, we need to evaluate v{Sj U {ij+i}) around N 
times, which can be done in an order-recursive manner starting 
from that of v{Sj). Specifically, we note that every expansion, 
Sj U {ij+i}, from Sj requires a calculation of v{Sj U {ij+i}) 
^ ^ from (TBI . This translates to appending a column, 7+1 , to 



#5 in the calculations of (T14l i. which can be done in an order- 
recursive manner. We summarize these calculations in Section 
HVl This order-recursive approach reduces the calculation in 
each search step to an order of 0(MN) operations down from 
0(M N 2 ) in the direct approach. Therefore, the complexity we 
incur is of the order O(PMN) in our greedy search for the 
best P support. 

A. A Repeated Greedy Search 

It is obvious that to achieve the best signal estimate, S d 
should contain all 2 N possibilities of supports. However, our 
greedy algorithm would result in an approximation of the 
signal estimate because it selects only a handful of supports. 
The accuracy of the reconstructed signal is dependent on the 
number of support vectors in S d and may be increased by 
repeating the greedy algorithm a number of times (e.g., D). 
This would result in S d with D number of support vectors for 

2 |cS|, i.e., support of the constructed signal, follows the binomial distri- 
bution B(N,p), which can be approximated by the Gaussian distribution 
Af(Np,Np(l - p)) if Np > 5. For this case, PrflSI > P) = 

ierfc . 

2 v '2JVp(l-p) 
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TABLE I: The Greedy Algorithm 



1) Initialize L = {1, 2, . . . , N}, S max = {}, S d = {}, 
i = 1, L, = L. 

2) If i > P, then stop. 

3) Generate = {S max U {ai\,S ma x U 

{a 2 }, ■ ' ■ ,Smax U {a|L,|} I "fc G i'i} 

4) Compute {is{S k ) \ S k G fi}. 

5) Find 5* G £1, such that v(S*) > maxj v(Sj). 

6) Update, <S d = {<S d ,S*}, 5 ma:c = S\ L i+1 = 
L \ S*. 

7) Set i i + 1 and repeat steps |2] - Q 
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89.76 












{2} 


12.29 




{5, 1, 2} 


70.2 










{5, 2} 


89.23 




{5, 1, 3, 2} 


110.76 
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12.11 




{5, 1, 3} 


70.99 








{5,3} 


89.54 




{5, 1, 3, 4} 


111.09 




{4} 


12.71 




{5, 1, 4} 


70.89 








{5,4} 


89.52 




{5, 1, 3, 6} 


110.89 




{5} 


12.89 




{5, 1, 6} 


70.01 








{5,6} 


89.01 




{5, 1, 3, 7} 


110.24 




{6} 


12.32 




{5, 1, 7} 


70.7 










{5,7} 


89.61 












{7} 


12.81 
















{51 
{[5]} 

£j = {1, 2, 3, 4, 
6,7) 


{5,1} 

[5, 1]} 

tj = {2,3,4, 6,7} 


{5,1,3} 

{[51, [5, 1], [5, 1, 3]} 

t, = {2, 4, 6, 7} 


{5,1,3,6} 

{[5], [5,1], {5, 1,3], [5,1, 
3,61} 

Is = R4, 7} 



Fig. 1: An example run of the greedy algorithm for N 
and P = 4 



IV. Efficient Computation of the Dominant 
Support Selection Metric 

As explained in SectionUll] v(S) requires extensive compu- 
tation to determine the dominant supports. The computational 
complexity of the proposed algorithm is therefore largely 
dependent upon the way v{S) is computed. In this section, a 
computationally efficient procedure to calculate this quantity 
is presented. 

We note that the efficient computation of v(S) depends 
mainly on the efficient computation of the term £s = 

K^s^^)" 1 *^)! 2 = ||* 5 E[x|y ;l S]|t 2 . Our focus is 
therefore on computing E[x|y, <S] efficiently. 

Consider the general support S = {si, s 2 , S3, ... , s k } with 
si < s 2 < ■ ■ ■ < s k and let S and S denote the following 
subset and superset, respectively: 



S = {si, s 2 , S3, • 



■ , Sfe. 



-1} S = {si, s 2 , s 3 , . . . , Sk+i} 



(15) 



where s k < s^+i ■ in tne following, we demonstrate how to 
update e y , fc _i(5) 4 E[xg] y] to obtairfl e y , fc (<S) = E[x s | y]. 
Here, we use S to designate the supports and thus E[xs|y] 
refers to E[x|y, <S]. We note that 



*sy = 



(16) 



By using the block inversion formula to express the inverse of 
the above and simplifying, we get 



each of the P sparsity levels (i.e., a total of PD supports). The 
selection of supports in subsequent repetitions of the algorithm 
is performed by making sure not to select an element at a 
particular sparsity level that has been selected at the same 
sparsity level in any of the previous repetitions. We note 
that a repeated greedy search in this manner would incur a 
complexity of order 0(DPM N). For a detailed description of 
the steps followed by the method, the pseudocode is provided 
in the Appendix and the nGpFBMP code is provided on the 
author's websit^l 



Remark 

Let and 5 - 2 ' be two different support vectors at the jth 



sparsity level. Note from (Q3J that if v(S^) > v{Sy) for 
some a\ and p, then that inequality remains valid regardless 
of how a\ and p change. In other words, the selection 
of dominant supports at each sparsity level is independent 
of these quantities. This observation helps the algorithm to 
bootstrap itself and to estimate the unknown sparsity rate and 
noise variance, as demonstrated ahead. 



e y ,/c(S) = 



77 (q£,*X 5 ) e y,fc-i(-£) - e y ,i(s fe )) e*,*(S) 

+ e y,k-l(S) 



7 (q0,fc( 5 ) e y,fc-i(£) - e yA s k 



fs 



(17) 



This recursion is initialized by e y i(i) = (^fi^(p s J ^"y for 

i = 1,2, . . . , N. The recursion also depends on k (S) = 
^ Sk , e ,*(S) 4 (fcg**)- 1 *]^ and f s 4 1 - 
l4,,k( S ) e 4'A s )- The recursions for q^ fc (5), and e^^S) 
may be determined in a similar fashion as 



7J (q0,fc(>5) e 0,fc(-5; Sfc+i) - e , 2 (s fe ; s k+1 ) 



7s (q0,fc( 5 ) e <?2,fc(5;s fc+ i) - e0 j2 (sA;;s fe+ i) 

(18) 



an 



d2 



q0,fc+i(<5) = 



FT 







&s k+1 = 



(19) 



3 The MATLAB code of the nGpFBMP algorithm and the results 
from various experiments discussed in this paper are provided at 
http://faculty.kfupm.edu.sa/ee/naffouri/publications.html 



4 We explicitly indicate the size k of 5 in this notation as it elucidates the 
recursive nature of the developed algorithms. 

5 Notation such as e<^ fe(5; s^_|_i) is a short hand for e<^ j.(iS U 
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The two recursions ([18} and ([19} start at k — 2 and are 
thus initialized by e,p^(si; S2) and q^^si; S2) for si,S2 = 
1,2,..., TV. This completes the recursion of e y fc(5) which 
we utilize for recursive evaluation of v(S) as 



^.(S) = ^||* 5 e y , fc (S)|| 2 --^||y|| 2 
+ (N-\S\)ln(l-p) 



\S\lnp 



(20) 



V. Estimation of the hyperparameters p and a\ 

One of the advantages of the proposed nGpFBMP is that it 
is agnostic with regard to signal statistics; the only parameters 
required are the noise variance, cr„, and the sparsity rate, 
p. In Section [Till we pointed out that the dominant support 
selection process is independent of parameters p and er^. We 
are therefore able to determine the supports irrespective of 
the initial estimates of these parameters. The independence 
of the dominant support selection process from p and a\ 
allows accurate and rapid estimation of these parameters. 
We note that although p and ct„ are not needed for support 
calculations, they are required for computing the posteriors 
used in the calculation of x ammse in (fT3l . To determine these 
estimates, we might opt for finding the maximum-likelihood 
(ML) or maximum a posteriori (MAP) estimates using the 
expectation maximization (EM) algorithm. However, this will 
add to the computational complexity and is unnecessary as a 
fairly accurate estimation could be performed in a very simple 
manner as follows. 

The maximum a posteriori (MAP) estimate of support S 
is given by S map = argmax 5 p(5|y). We use this to get the 

This X ma p is 



MAP estimate of x, i.e., x„ mp = E[x|y, <S, 



J map\ 



in turn used to estimate p, iteratively, as follows: 



P 



IN 



(21) 



Here, the superscripts refer to a particular iteration. The 
estimate is computed iteratively where in the first iteration 
of nGpFBMP, p 1 - 1 ^ is initialized by pimt, the given initial 



estimate, to compute x 



(i) 

map- 



This is used to find the new 



estimate, p^ 2 \ using (fjjji which is then supplied to nGpFBMP 

(2) 

in the next iteration to compute jc ma p- This process is repeated 
until the estimate of p changes by less than 2% or until 
a predetermined maximum number of iterations has been 
performed. Simulation results show that, in most cases, p 
converges rapidly. At this stage, the estimate of the noise 
variance can be computed as follows: 

-2 



a n = var(y — <frx 



map } 



(22) 



We note that we do not need any iteration for estimating the 

noise variance. We use the x map corresponding to the final 

-2 



estimate, p, to find <t„. 



VI. Results 

To demonstrate the performance of the proposed nGpFBMP, 
we compare it here with Fast Bayesian Matching Pursuit 
(FBMP) [9] and the convex relaxation-based (Zi) approach. 
The reason FBMP was selected is that it was shown to 



outperform a number of the contemporary sparse signal re- 
covery algorithms, including OMP 03], StOMP (TJl, GPSR- 
Basic JTH, and BCS JT8J- Comparison with FBMP shows 
that nGpFBMP performs where FBMP fails for various signal 
settings which are discussed in detail in the following. 

Signal Setup 

Experiments were conducted for signals drawn from Gaus- 
sian as well as non-Gaussian distributions. The following 
signal configurations were used for the experiments: 

1) Gaussian (i.i.d. ({i x = 10, rr x = 2)). 

2) Non-Gaussian (Uniform, non-i.i.d. (5 < ^ x < 10, 1 < 
ol < 2)). 

3) Unknown distribution (for this example, different images 
with unknown statistics were used), 

where /i x and <r x refer to the mean and variance of the 
corresponding distributions, respectively. 

Entries of M x N sensing/measurement matrix 3? were i.i.d., 
with zero means and complex Gaussian distribution where 
the columns were normalized to the unit norm. The size of 
<I> selected for the experiments was M = 256,7V = 1024. 
The noise had a zero mean and was white and Gaussian, 
CAf(0, ct^Iju), with a\ determined according to the desired 
signal-to-noise ratio (SNR). Initial estimates of the hyperpa- 
rameters used for the simulations were /i x est = 0' °x est = 
x er x , (7^ est = 10 x cr^, and p e st = 0.003, where estimates 
of the signal mean and variance were needed for FBMP. 

In all of the experiments, parameter refinement was per- 
formed for both nGpFBMP and FBMP. For FBMP, the surro- 
gate EM method proposed by its authors was used to refine 
the hyperparameters. The refinements were allowed to perform 
for a maximum of E max = 10 iterations. For fairness, support 
and amplitude refinement [ 3 1 procedures were performed on 
the results of the CS algorithmic Finally, the normalized mean- 
squared error (NMSE) between the original signal, x, and 
its MMSE estimate, x ammse , was used as the performance 
measure: 



e = 10 log 



10 



1 K 
~K 51 



k=l 



|Xfc - x fc | 
l|x fc || 2 



(23) 



where K was the number of trials performed to compute 
NMSE between x and its estimate. 

Experiment 1 ( Signal estimation performance comparison for 
varying SNR) 

In the first set of experiments, NMSEs were measured 
for values of SNR between dB and 30 dB and plotted 
to compare the performance of nGpFBMP with FBMP and 
the CS algorithm. The signal sparsity rate selected for these 
experiments was p — 0.005. 

Figs. [2a] and [2b] show that the proposed method has better 
NMSE performance than both FBMP and CS for all consid- 
ered signals. Only at very high values of SNR does the NMSE 
performance of FBMP approach that of nGpFBMP. 

6 Actual parameter values were provided to the CS algorithm instead of 
estimates; furthermore, support and amplitude refinement was also performed 
to demonstrate that, despite these measures, its performance was inferior to 
that of nGpFBMP. 
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Experiment 2 ( Signal estimation performance comparison for 
varying sparsity parameter p) 

In a similar set of experiments, NMSE and mean runtime 
were measured for different values of sparsity parameter p. 
The value of SNR selected for these experiments was 20 dB. 

Figs. |3] and |4] demonstrate the superiority of nGpFBMP 
over FBMP and CS. Runtime graphs of Figs. [3] and [4] depict 
that the runtime of nGpFBMP increases for higher values of 
p. This occurs because the initial estimate of p was 0.003, 
and as the sparsity rate of x increased, more iterations were 
required to estimate the value of p. With higher values of p, 
the difference in runtime is insignificant given the excellent 
NMSE performance of our method. We also observe that 
performance of nGpFBMP is relatively insensitive to changes 
in p as the corresponding changes in NMSE are very small, 
thus demonstrating the strength of the proposed algorithm. 

Experiment 3 ( Comparison of signal estimates when the initial 
statistics of signal and noise are very close to the actual 
values) 

Table HU compares the average NMSEs of FBMP and 
nGpFBMP for different types of signals when the initial 
estimates (p x , cr^, and cr£) were chosen to be very near to 
their actual values. Since nGpFBMP is independent of these 
initial estimates its performance did not change. On the other 
hand, performance of FBMP improved, although it did not 
outperform nGpFBMP. The results show the robustness of 
our algorithm as it is not dependent on the initial estimates 
of p x , tr^j an d a n- We note that each value in Table HT1 has 
been computed by averaging the results of 500 independent 
experiments. 

Experiment 4 ( Comparison of multiscale image recovery per- 
formance) 

In another experiment, we carried out multiscale recovery 
of different images that were 128 x 128 pixels. These images 
are shown in the first row of Fig. One-level Haar wavelet 
decomposition of these images was performed, resulting in 



TABLE II: Average NMSE comparison between FBMP and 
nGpFBMP when the initial estimates of the hyperparameters 
are close to the actual values. Values are in dB. 



Signal type 


FBMP 


nGpFBMP 


Gaussian 


-20.55 


-31.103 


Uniform (i.i.d.) 


-24.2 


-30.98 


Uniform (non-i.i.d.) 


-23.87 


-30 



TABLE III: NMSE (e) and Reconstruction time comparisons 
between FBMP and nGpFBMP for the test images shown in 
Fig.0 



Image 


FBMP 


nGpFBMP 


e(dB) 


Time (s) 


e (dB) 


Time (s) 


Mondrian 


-16.68 


19.36 


-18.84 


6.8 


Fingerprint 


-9.97 


19.24 


-13.92 


6.75 


Facets 


-14.75 


19.56 


-19.30 


7.01 



one 'approximation' (low-frequency) and three 'detail' (high- 
frequency) images. Unlike the approximation component, the 
detail components are compressible in nature. We, there- 
fore, generated their sparse versions by applying a suitable 
threshold; the noisy random measurements were acquired later 
from the threshold versions. The detail components were 
reconstructed from these measurements through nGpFBMP. 
Finally, inverse wavelet transform was applied to reconstruct 
the images from the recovered details and the original approx- 
imations. Reconstruction errors and times were recorded and, 
for comparison, recoveries were obtained using FBMP. The 
resulting reconstructed images are shown in Fig. Numerical 
details of the results for these experiments are given in Table 
Hill from which it is obvious that images reconstructed using 
nGpFBMP have lower NMSEs and require a significantly 
shorter reconstruction time than does FBMP. 
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NMSE vs p for non-iid Uniform input. M=256, N.1024, SNR.20dB, p1 .0.003 
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Runtime vs p for non-iid Uniform input. M.256, N.1024, SNR.20dB, p1 .0.003 
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Fig. 3: NMSE and average runtime vs p graphs for uniform non-i.i.d. input 



NMSE vs p for iid Gaussian input. M.256, N.1 024, SNR=20dB, p1 .0.003 



Runtime vs p for iid Gaussian input. M.256, N.1024, SNR.20dB, p1 .0.003 
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Fig. 4: NMSE and average runtime vs p graphs for Gaussian i.i.d. input 



VII. Conclusion 

In this paper, we presented a robust Bayesian matching 
pursuit algorithm based on a fast recursive method. Compared 
with other robust algorithms, our algorithm does not require 
signals to be derived from some known distribution. This 
is useful when we can not estimate the parameters of the 
signal distributions. Application of the proposed method on 
several different signal types demonstrated its superiority and 
robustness. 
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Appendix 

Algorithm: Non-Gaussian prior fast Bayesian matching pur- 
suit 

1) The following quantities are calculated one time and 
used in the following iterations 

a) inner products of y with columns of €>. 

b) inner products of columns of <I> with each other. 

c) e y ,i(i) = ~* $y for i = 1, 2, . . . , N i.e. 
E[x|y, S] for each of the one-length support vector. 
Note that the components of these calculations are 
available from steps [Ta] and [Tb] 

2) Calculate N one-element metrics, v(Si), using e y> i 
from the previous step for each of the possible support 
elements oti,i G [1,N] as explained in Section iHll 

3) Perform the following search D times (each search has 
p = 1 : P stages) 

a) At each pth stage: 

i) Find the p-element metric, v(S p ), with the 
maximum value and note down the correspond- 
ing a,-. 

ii) If this on has been selected in any of the 
previous pth stages, mark it and discard it and 
go to step |3(a)i| again. 

iii) else update e^p using ( TT8b for this newly 
selected position, on, and add the corresponding 
S p as a new member of S d . 

iv) Save e^ :P to update its value in the next itera- 
tion. 

v) Use e^ :P and e y p to compute the update 
e y , P +i CLZ]i for all S p+1 = [S p a,-],Vi | a; ^ 
Sp. 

vi) Compute the dominant support selection met- 
rics for all these combinations S p +i using the 
computed e y p+ i. This yields (p + l)-element 
metrics i/(iS p +i) to be used in the next iteration. 

b) end p 

4) S d contains the dominant supports that will be used to 
find the sum in ( fT3b . 



